[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation#36093
Conversation
Code Review
This pull request introduces an optimization to the torch.compile integration by using FakeTensors instead of real GPU tensors during single-size compilation. This avoids unnecessary GPU memory allocation. The changes are implemented in two parts: create_concrete_args is updated to generate FakeTensors, and InductorStandaloneAdaptor.compile is patched to handle these tensors correctly by reusing the FakeTensorMode, which also serves as a workaround for an upstream PyTorch issue. The implementation is clean, well-commented, and the logic appears sound. I have no major concerns with this change.
Force-pushed from 6f5635a to bcb9e5b
This pull request has merge conflicts that must be resolved before it can be merged.
looks good. please resolve merge conflicts.
Force-pushed from bcb9e5b to d19ac4a
create_concrete_args previously allocated real GPU tensors (via
torch.empty) just to carry shape/stride/dtype/device metadata into
standalone_compile. Switch to FakeTensors under a FakeTensorMode with
a dummy ShapeEnv. (dummy ShapeEnv instead of None is needed to keep
AOTAutogradCache happy)
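The metadata-only allocation described above can be sketched as follows. The shapes and dtype are illustrative (not the ones vLLM's `create_concrete_args` actually uses), and `device="cpu"` is chosen so the sketch runs without a GPU; the real code targets the compilation device.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

# A dummy ShapeEnv (rather than shape_env=None) keeps downstream
# consumers such as AOTAutogradCache happy.
mode = FakeTensorMode(shape_env=ShapeEnv())

# Under the mode, factory functions produce FakeTensors that carry
# shape/stride/dtype/device metadata but allocate no real storage.
with mode:
    fake = torch.empty(16, 4096, dtype=torch.float16, device="cpu")

assert isinstance(fake, FakeTensor)
assert fake.shape == (16, 4096)
```

Because no storage is materialized, passing such tensors into `standalone_compile` avoids the GPU memory that `torch.empty` on a CUDA device would otherwise reserve.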
standalone_compile("from_example_inputs") creates its own FakeTensorMode
internally, which would conflict with our FakeTensors. Work around this
by patching FakeTensorMode in standalone_compile to reuse our mode.
Tracked upstream: pytorch/pytorch#176562
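The workaround amounts to patching the `FakeTensorMode` name that `standalone_compile` resolves internally so any mode it constructs is actually our existing one. A minimal sketch of the pattern, assuming `mock.patch.object` on the module where the name is looked up (the exact target module depends on the PyTorch version; `torch._subclasses.fake_tensor` is patched here only to demonstrate the mechanism):

```python
import torch
from unittest import mock
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

# The mode our FakeTensors were created under.
outer_mode = FakeTensorMode(shape_env=ShapeEnv())

class _ReuseOuterMode:
    """Constructor stand-in: ignore all arguments and hand back the
    already-existing mode instead of building a fresh, conflicting one."""
    def __new__(cls, *args, **kwargs):
        return outer_mode

# While the patch is active, internal FakeTensorMode() calls resolve
# through the stand-in and return outer_mode.
with mock.patch.object(
    torch._subclasses.fake_tensor, "FakeTensorMode", _ReuseOuterMode
):
    inner = torch._subclasses.fake_tensor.FakeTensorMode()

assert inner is outer_mode
```

Reusing one mode matters because FakeTensors are tied to the mode that created them; letting `standalone_compile("from_example_inputs")` build a second mode would reject the inputs created under the first.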
Signed-off-by: Richard Zou <zou3519@gmail.com>
Force-pushed from 17d977c to 7c46bdb
…e-size compilation (vllm-project#36093) Signed-off-by: Richard Zou <zou3519@gmail.com>